Abstract- A bus system that can change dynamically to suit computational needs
is referred to as reconfigurable. We present a fast adaptive convex hull algorithm
on a 2-dimensional processor array with a reconfigurable bus system
(2-d PARBS, for short). Specifically, we show that computing the convex hull
of a planar set of n points takes $O(\log n/\log m)$ time on a 2-d PARBS of size
$mn \times n$ with $3 \le m \le n$. Our result implies that the convex hull of n points in
the plane can be computed in $O(1)$ time on a 2-d PARBS of size $n^{1.5} \times n$.

1 Introduction 

Recent advances in VLSI have made it possible to build massively parallel machines featur- 
ing many thousands of cooperating processors. This increase in computational power does 
not, however, translate into increased performance of the same order of magnitude. One 
of the reasons seems to be that interprocessor communications and simultaneous memory 
accesses often act as bottlenecks in parallel machines. 

To alleviate the inefficiency of long distance communication among processors, bus 
systems have been recently added to a number of parallel machines [2-4,5,6,11]. If such a 
bus system can be dynamically changed, under program control, to suit communication 
needs among processors, it is referred to as reconfigurable. Examples include the bus 
automaton [11], the reconfigurable mesh, and the polymorphic torus [2,3], among others. 

The computational model used throughout this work is the reconfigurable mesh [5]. 
An $m \times n$ reconfigurable mesh (also called a PARBS [13]) consists of $m \times n$ identical
processors positioned on a rectangular array (refer to Figure 1). The processor at (i,j)
($1 \le i \le m$; $1 \le j \le n$) is identified by P(i,j). Every processor has 4 ports denoted by N,
S, E, and W. There are also implicit north, south, east, and west directions (refer to Figure
1). In each processor, ports can be dynamically connected in pairs to suit computational 
needs. In the absence of these local connections, the PARBS is functionally equivalent to 
the mesh connected computer. 

1 Support by NASA under grant NCC1-99 and by the National Science Foundation under
grant CCR-8909996 is gratefully acknowledged.
Figure 1: A 4 x 5 PARBS

We assume that each processor has a small number of registers of size O(log n) bits and
that a processor can perform in unit time standard arithmetic and boolean operations. We 
assume a single instruction stream: in each time unit the same instruction is broadcast 
to all processors, which execute it and wait for the next instruction. Each instruction 
can consist of setting local connections (as explained later), performing an arithmetic or 
boolean operation, broadcasting a value on a bus, or receiving a value from a specified 
bus. The regular structure of the PARBS makes it suitable for VLSI implementation. In 
fact, it has been argued [5] that the PARBS can be used as a universal chip capable of 
simulating any equivalent-area architecture without loss of time. 

By adjusting the local connections within each processor several subbuses can be established.
We assume that the setting of local connections is destructive in the sense that
setting a new pattern of connections destroys the previous one. At any given time, only
one processor can broadcast a value onto a bus. Processors, if instructed to do so, read
the bus. If no value is being transmitted on the bus, the read operation has no result.
It is assumed [5,6] that communications along buses take O(1) time. This seems to be a
reasonable assumption in the light of recent experiments with the YUPPIE system [4].
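
The bus rules above can be illustrated with a small sequential simulation of a single row. This is only a sketch under assumptions that are not in the original: the function name `eastbound_broadcast` and the lists `fused` and `outgoing` are hypothetical, a writer drives only the link on its east side, and a read on an idle bus is modeled as `None`.

```python
def eastbound_broadcast(fused, outgoing):
    """Simulate one eastbound broadcast step on a single row of processors.

    fused[j]    -- True if processor j has its W and E ports connected.
    outgoing[j] -- value processor j writes on its E port, or None.
    Returns received[j]: the value processor j reads on its W port,
    or None when nothing is being transmitted on that subbus.
    """
    n = len(fused)
    received = [None] * n
    on_bus = None  # value carried by the link entering processor j from the west
    for j in range(n):
        received[j] = on_bus
        if outgoing[j] is not None:
            # Only one processor may broadcast on a given subbus.
            if fused[j] and on_bus is not None:
                raise ValueError("two writers on one subbus")
            on_bus = outgoing[j]
        elif not fused[j]:
            # An unconnected processor cuts the bus: nothing travels further east.
            on_bus = None
        # A fused processor that does not write simply lets the value pass.
    return received
```

For instance, with `fused = [False, True, True, False, True]` and `outgoing = [7, None, None, 9, None]`, processors 1 to 3 read 7 and processor 4 reads 9, because processor 3 starts a new subbus on its east side.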

A number of problems have been solved in O(1) time on PARBS. Very recently, Wang
et al. [13] have proposed O(1) algorithms for the transitive closure and some related graph
problems; Olariu, Schwing, and Zhang [9] have proposed an adaptive sorting algorithm;
specifically, they show that sorting a sequence of n reals takes $O(\log n/\log m)$ time on a 2-d
PARBS of size $nm \times n$ with $3 \le m \le n$. In particular, their result implies a constant-time
sorting algorithm on an $n^{1.5} \times n$ 2-d PARBS.

The convex hull of a set of points in the plane is defined as the smallest area convex 
set that contains the original set. The problem of computing the convex hull of points 
in the plane is central in a variety of problems in pattern recognition, computer graphics, 
statistics, and image processing [1,7,8,10]. 
To the best of our knowledge, no convex hull algorithm has been reported in the literature
on a 2-d PARBS. The purpose of this paper is to propose a fast adaptive convex
hull algorithm for a set of n points in the plane. We reduce the problem of computing
the convex hull of a set of planar points to the problems of sorting and computing the
prefix maximum of n real numbers. To begin, we show that the problem of computing the
maximum of n real numbers can be solved in $O(\log n/\log m)$ time on a 2-d PARBS of size $m \times n$
with $2 \le m \le n$. We also use the fast adaptive sorting algorithm of [9]. What results is a
fast adaptive algorithm that computes the convex hull of a set of n points in the plane in
$O(\log n/\log m)$ time on a 2-d PARBS of size $nm \times n$ with $3 \le m \le n$. In particular, for $m = n^{0.5}$
we obtain an $O(1)$ time convex hull algorithm on a 2-d PARBS of size $n^{1.5} \times n$.


2 The stepping stones 

Our convex hull algorithm relies on a number of intermediate results that we present next. 
To begin, we consider the problem of computing the prefix maximum of n reals on an $n \times n$
PARBS. Specifically, given n real numbers $a_1, a_2, \ldots, a_n$ with processor P(1,j) storing
$a_j$, the problem is to compute $\max_{1 \le i \le j}\{a_i\}$ for all $1 \le j \le n$. Our algorithm involves
establishing a number of subbuses and broadcasting values along them. The details of our 
algorithm are spelled out by the following sequence of steps. 


Algorithm Prefix-Maximum;

Step 1. every processor P(i,j) ($2 \le i \le n-1$; $1 \le j \le n$) connects its ports N and S;

Step 2. every processor P(1,j) ($1 \le j \le n$) broadcasts $a_j$ southbound along the vertical
subbus in column j;

Step 3. every processor P(i,j) ($2 \le i \le n-1$; $2 \le j \le i$) connects its ports W and E;

Step 4. every processor P(j,j) ($2 \le j \le n$) broadcasts $a_j$ westbound along the horizontal
subbus in row j;

Step 5. every processor P(j,i) ($2 \le j \le n-1$; $1 \le i < j$) compares $a_i$ and $a_j$;
if $a_i > a_j$ then
    disconnects the horizontal subbus;
    marks itself;

Step 6. every marked processor P(i,j) broadcasts a "0" along the horizontal subbus
eastbound;

Step 7. every processor P(j,j) ($1 \le j \le n$) stores in its own memory a "0" or a "1"
depending on whether or not it has received a "0" in Step 6;

Step 8. every processor P(i,j) ($2 \le i < j \le n$) connects its ports N and S;

Step 9. every processor P(j,j) ($2 \le j \le n$) broadcasts on the vertical subbus northbound
the value it has stored in Step 7;

Step 10. every processor P(1,j) ($2 \le j \le n$) that has received a "0" in Step 9 connects
its ports W and E;

Step 11. every processor P(1,j) ($1 \le j \le n$) that stores a "1" broadcasts $a_j$ eastbound
along the horizontal subbus in row 1;

Theorem 1. Algorithm Prefix-Maximum correctly computes the prefix maximum of n
real numbers in O(1) time on an $n \times n$ PARBS.

Proof. To begin, note that in Step 5, every processor P(i,j) ($2 \le i \le n-1$; $1 \le j < i$)
knows $a_i$ and $a_j$. Further, it is easy to see that at the end of Step 7 processor P(j,j)
($1 \le j \le n$) stores a "1" if, and only if, $a_j$ is at least as large as every $a_i$ with $i < j$.

Consequently, every processor P(1,j) in row 1 that at the end of Step 9 stores a "0"
knows that $a_j$ cannot be the prefix maximum of the $a_i$ with $i \le j$. In fact the prefix maximum
of the first j real numbers $a_1, a_2, \ldots, a_j$ is stored by the first processor to the left of P(1,j)
that stores a "1". The conclusion follows. □
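
The logic of Algorithm Prefix-Maximum can be sketched sequentially as follows. This is a plain simulation of what the processors compute, not a PARBS implementation; the function name `prefix_maximum_flags` and the 0-based indexing are assumptions made for illustration.

```python
def prefix_maximum_flags(a):
    """Sequentially mimic Algorithm Prefix-Maximum on the list a.

    Returns (flag, result) where flag[j] is the bit stored by the diagonal
    processor for a[j] (1 iff no earlier element exceeds a[j]) and
    result[j] is the prefix maximum of a[0..j].
    """
    n = len(a)
    # Steps 1-7: in row j, a processor holding an earlier a[i] > a[j]
    # sends a "0" to the diagonal; otherwise the diagonal stores a "1".
    flag = [1] * n
    for j in range(1, n):
        if any(a[i] > a[j] for i in range(j)):
            flag[j] = 0
    # Steps 8-11: in row 1, processors with flag 0 connect W and E, and
    # processors with flag 1 broadcast eastbound, so each position ends up
    # with the value of the nearest flagged position at or to its left.
    result = []
    current = None
    for j in range(n):
        if flag[j] == 1:
            current = a[j]
        result.append(current)
    return flag, result
```

On the PARBS every comparison in the first loop happens simultaneously in a distinct processor, which is why the whole computation takes a constant number of broadcast steps; the nested loops here only reproduce the outcome.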

Next, we show how to compute the maximum of n real numbers $a_1, a_2, \ldots, a_n$ on an
$m \times n$ PARBS with $2 \le m \le n$. Again, we assume that the numbers are stored one per
processor such that for all j ($1 \le j \le n$), P(1,j) stores $a_j$. The idea of our algorithm is to
partition the original m x n PARBS into subPARBS of size m x m. To avoid tedious but 
inconsequential housekeeping details we assume that n is a power of m. 

We partition the n columns into contiguous groups of m columns each and let the k-th
subPARBS, $M_k$ ($0 \le k \le n/m - 1$), consist of the columns $km+1, km+2, \ldots, km+m$.
As a preprocessing step, for all j ($2 \le j \le n$) we move the data contained in P(1,j) to the
"diagonal" processor of its $m \times m$ subPARBS, P((j - 1) mod m + 1, j). The main loop of
this algorithm applies the (prefix) maximum algorithm described above to specified $m \times m$
subPARBS. This process proceeds iteratively, determining the maxima of groups of size
$m, m^2, m^3$, and so on. Clearly, in $\log_m n = \log n/\log m$ iterations we have computed the maximum
of the n numbers.

We omit the details of bus-construction steps which are similar to those in the previous 
algorithm. The reader can easily fill in the details. 


Algorithm Maximum;

Step 1. {preprocessing}
for all j ($1 \le j \le n$) in parallel
    establish a vertical subbus from P(1,j) to P((j - 1) mod m + 1, j);
    P(1,j) broadcasts $a_j$ on this subbus to P((j - 1) mod m + 1, j);
    P((j - 1) mod m + 1, j) marks itself
endfor;

Step 2. {main loop}
for k ← 1 to $\log n/\log m$ do
    for all j ($1 \le j \le n/m^k$) in parallel
        all processors connect ports W and E;
        all processors P(i, $(j-1)m^k + 1$) split the horizontal subbus in row i;
        all marked processors broadcast the value they hold
            along the horizontal subbus westbound;
        all marked processors unmark themselves;
        $M_{(j-1)m^{k-1}}$ computes the maximum of the values
            in column $(j-1)m^k + 1$;
        let the result be stored in P((j - 1) mod m + 1, $(j-1)m^k + 1$);
        all processors P((j - 1) mod m + 1, $(j-1)m^k + 1$) mark themselves
    endfor
endfor;

Theorem 2. Algorithm Maximum correctly computes the maximum of n real numbers
in $O(\log n/\log m)$ time on an $m \times n$ PARBS with $2 \le m \le n$.

Proof. The correctness is implied by the following result: at the end of the t-th iteration
($0 \le t \le \log n/\log m$), for all j ($1 \le j \le n/m^t$), processor P((j - 1) mod m + 1, $(j-1)m^t + 1$)
contains the maximum in columns $(j-1)m^t + 1$ through $jm^t$.

The proof of the above statement is by induction on t. The basis is easy: at the end of
the 0-th iteration the conclusion is guaranteed by the preprocessing step.

Assume the above statement satisfied at the end of the t-th iteration. We only need
show that it also holds at the end of the (t+1)-st iteration. For this purpose, it is instructive
to follow the (t+1)-st iteration: here, after all processors connect their ports W and E
thus establishing horizontal subbuses in each row, the processors P(i, $(j-1)m^{t+1} + 1$) split
the horizontal subbus in row i; next, all marked processors broadcast the value they hold
along the horizontal subbus westbound. By the induction hypothesis, these are processors
P((j - 1) mod m + 1, $(j-1)m^t + 1$). Therefore, when the subPARBS $M_{(j-1)m^t}$ compute
the maximum of the values in column $(j-1)m^{t+1} + 1$, the induction hypothesis guarantees
that the resulting value is the maximum in columns $(j-1)m^{t+1} + 1$ through $jm^{t+1}$, a total
of $m^{t+1}$ columns.

To argue for the running time, note that by Theorem 1, the inner for loop runs in O(1)
time. The conclusion follows. □
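
The iteration count in Theorem 2 can be checked with a short sequential sketch. It is not a PARBS program: the function name `adaptive_maximum` is hypothetical, each pass of the loop stands for one O(1)-time iteration of the main loop, and, as in the text, the input length is assumed to be a power of m.

```python
def adaptive_maximum(a, m):
    """Reduce a to its maximum by repeatedly taking maxima of groups of m.

    Returns (maximum, iterations). Each pass corresponds to one iteration
    of the main loop of Algorithm Maximum, so for len(a) = m**k the
    iteration count is k = log n / log m.
    """
    if m < 2 or not a:
        raise ValueError("need m >= 2 and a non-empty input")
    values = list(a)
    iterations = 0
    while len(values) > 1:
        # One iteration: every group of m candidates is replaced by its maximum.
        values = [max(values[g:g + m]) for g in range(0, len(values), m)]
        iterations += 1
    return values[0], iterations
```

With n = 16 values, m = 4 needs two passes and m = 16 needs one, matching the trade-off between the number of rows m and the running time.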


3 The Algorithm 

We are now in a position to present our planar convex hull algorithm. Let $S = \{p_1, p_2, \ldots, p_n\}$
be a planar set of points; for $1 \le i \le n$, $p_i$ is represented by its Cartesian coordinates $(x_i, y_i)$.

To avoid tedious details we assume, without loss of generality, that the points in S are in 
general position, with no three collinear and no two having the same x or y coordinate. 
The output of the convex hull algorithm is a linked list CH that contains all the points 
on the convex hull starting with the one with the largest x coordinate and proceeding
counterclockwise. Our algorithm consists of the following sequence of steps. 

Algorithm Convex-Hull;

Step 1. find the four extremal points in S, and let them be, without loss of generality, $p_1$,
$p_2$, $p_3$, and $p_4$. Specifically, $x_1 = \max_{1 \le j \le n}\{x_j\}$, $y_2 = \max_{1 \le j \le n}\{y_j\}$, $x_3 = \min_{1 \le j \le n}\{x_j\}$,
and $y_4 = \min_{1 \le j \le n}\{y_j\}$.

Step 2. compute the sets

$S_1 = \{p_i \mid x_2 \le x_i \le x_1;\ y_1 \le y_i \le y_2\}$,

$S_2 = \{p_i \mid x_3 \le x_i \le x_2;\ y_3 \le y_i \le y_2\}$,

$S_3 = \{p_i \mid x_3 \le x_i \le x_4;\ y_4 \le y_i \le y_3\}$,

$S_4 = \{p_i \mid x_4 \le x_i \le x_1;\ y_4 \le y_i \le y_1\}$.

Note: For simplicity, we deal with $S_1$ only, the others being perfectly similar.

Step 3. sort the points in $S_1$ by increasing y coordinate, and let $L_1 = (p_1 = q_1, q_2, \ldots, q_t = p_2)$
be the resulting sorted sequence;

Step 4. for all j ($1 \le j < t$) in parallel
    find the subscript $d_j$ ($j < d_j \le t$) such that the angle determined by
    $q_j$, $q_{d_j}$, and the negative direction of the x axis is as large as possible;

Step 5. compute the prefix maximum of the values $d_j$ in $L_1$, and set $m(j) \leftarrow \max_{1 \le i \le j-1}\{d_i\}$;

Step 6. $CH_1 \leftarrow L_1$;
for all j ($2 \le j \le t-1$) in parallel
    remove $q_j$ from $CH_1$ whenever $d_j \le m(j)$;
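
Steps 3 to 6 can be traced on a small input with the following sequential sketch. The function name `quadrant_hull_chain` is hypothetical, indices are 0-based, and the angle with the negative x axis is evaluated with `math.atan2` purely for convenience; the input is assumed to be the set $S_1$ in general position, including $p_1$ and $p_2$.

```python
import math


def quadrant_hull_chain(points):
    """Return CH_1 for the points of S_1, given as (x, y) tuples."""
    if not points:
        return []
    # Step 3: sort by increasing y, so q[0] = p_1 and q[-1] = p_2.
    q = sorted(points, key=lambda p: p[1])
    t = len(q)
    # Step 4: d[j] is the index k > j maximizing the angle between the
    # segment q[j] -> q[k] and the negative direction of the x axis.
    d = []
    for j in range(t - 1):
        best = max(
            range(j + 1, t),
            key=lambda k: math.atan2(q[k][1] - q[j][1], q[j][0] - q[k][0]),
        )
        d.append(best)
    # Steps 5-6: keep q[j] only if d[j] exceeds the maximum of all earlier d[i].
    chain = [q[0]]
    running = d[0] if d else 0
    for j in range(1, t - 1):
        if d[j] > running:
            chain.append(q[j])
        running = max(running, d[j])
    if t > 1:
        chain.append(q[-1])
    return chain
```

For the six points (10,0), (6,2), (5,4), (8,6), (3,9), (0,10) the d values are 4, 4, 4, 5, 6 in the 1-based numbering of the text, so $q_2$ and $q_3$ are removed and the chain (10,0), (8,6), (3,9), (0,10) remains.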

Before giving the proof of correctness of our algorithm, we need to take note of the 
following simple observation. The sorted sequence $L_1$ of points obtained at the end of Step
3 can be viewed as determining a polygonal line (termed a chain in [10]) joining $p_1$ and $p_2$.

It is easy to see that the convex hull CH of the set S of points is exactly the convex hull
of the simple polygon P obtained by concatenating the polygonal lines $L_1$, $L_2$, $L_3$, and $L_4$,
in this order.

The following result argues for the correctness of our algorithm. 

Theorem 3. At the end of Step 6, $CH_1$ contains the portion of the convex hull contained
in $S_1$.

Proof. By the previous observation we only need show that the linked list $CH_1$ obtained
at the end of Step 6 contains the restriction of the convex hull of P between $p_1$ and $p_2$.
This follows from the following claim:

a point $q_j$ ($2 \le j \le t-1$) of $L_1$ belongs to CH if, and only if, $d_j > m(j)$.

First, let $q_j$ ($2 \le j \le t-1$) in $L_1$ belong to the convex hull and let $q_i$ and $q_k$ ($i < j < k$)
be its immediate neighbors on the convex hull. (We note that since $q_1$ and $q_t$ trivially
belong to the convex hull, the points $q_i$ and $q_k$ are well defined.) Clearly, $d_i = j$ and so
$m(j) = j < d_j = k$, as claimed.

Conversely, if some point $q_j$ in $L_1$ does not belong to the convex hull then let $q_i$ and $q_k$
($i < k$) be the closest points on the convex hull, with $q_j$ lying on the chain from $q_i$ to $q_k$.
Since $q_i$ and $q_k$ are neighbors on the convex hull, we have $d_i = k$; furthermore, $d_j \le k = d_i \le m(j)$,
and the conclusion follows. □

Next, we propose to show how Steps 1-6 above can be efficiently implemented on a
2-d PARBS. More precisely, we assume a 2-d PARBS of size $nm \times n$ with $3 \le m \le n$.
Some of Steps 1-6 in our algorithm need the whole PARBS while others can
run on a subPARBS, as specified; the data movement necessary to conform to the input
requirements of a specific step is ignored here; the reader can easily work out all the
details.

Step 1 can be implemented to run in O(1) time on an $n \times n$ subPARBS since we only
need compute $\max_{1 \le j \le n}\{z_j\}$ and $\min_{1 \le j \le n}\{z_j\}$ with z = x and z = y.

Step 2 is demonstrated for $S_1$ only; computing $S_i$ with i = 2, 3, 4 is similar. All that
is needed is to establish a subbus running through the whole of row 1. The processors
storing $p_1$ and $p_2$ broadcast, in two computational steps, their Cartesian coordinates to all
processors in row 1; every processor that stores a point in $S_1$ marks itself. Thus Step 2
runs in O(1) time.
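
What Steps 1 and 2 compute can be shown with a brief sequential sketch. The function name `partition_by_extremes` is hypothetical, and the sketch only reproduces the definitions of $p_1, \ldots, p_4$ and $S_1, \ldots, S_4$; it says nothing about how the buses are configured.

```python
def partition_by_extremes(points):
    """Return (S1, S2, S3, S4) for a list of (x, y) points in general position."""
    # Step 1: the four extremal points.
    x1, y1 = max(points, key=lambda p: p[0])  # p_1: largest x
    x2, y2 = max(points, key=lambda p: p[1])  # p_2: largest y
    x3, y3 = min(points, key=lambda p: p[0])  # p_3: smallest x
    x4, y4 = min(points, key=lambda p: p[1])  # p_4: smallest y
    # Step 2: each point tests its own coordinates against the broadcast
    # coordinates of the two extremal points bounding the region.
    s1 = [p for p in points if x2 <= p[0] <= x1 and y1 <= p[1] <= y2]
    s2 = [p for p in points if x3 <= p[0] <= x2 and y3 <= p[1] <= y2]
    s3 = [p for p in points if x3 <= p[0] <= x4 and y4 <= p[1] <= y3]
    s4 = [p for p in points if x4 <= p[0] <= x1 and y4 <= p[1] <= y1]
    return s1, s2, s3, s4
```

Each membership test uses only a point's own coordinates and two broadcast values, which is why every processor in row 1 can decide locally whether to mark itself.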

Step 3 can be implemented as follows. First, all unmarked processors change the y
coordinate of the point that they store to $+\infty$. Now the sorting algorithm in [9] is invoked:
this runs in $O(\log n/\log m)$ time and uses the whole PARBS. Note that at the end of Step 3, processors
P(1,1), P(1,2), ..., P(1,t) contain $L_1$ in sorted order.

Step 4 can be implemented to run in O(1) time on an $mn \times n$ subPARBS as follows.
Recall from Step 3 that, initially, for all j ($1 \le j \le t$), P(1,j) stores $q_j$. For further reference,
this subPARBS is further subdivided into subPARBS of size $m \times n$ as follows. The first
$m \times n$ subPARBS involves the first m rows, the second the next m rows, and so on.
We establish vertical subbuses in each column and let P(1,j) broadcast the Cartesian
coordinates of $q_j$ along the subbus in column j ($1 \le j \le t$). Next, establish horizontal
subbuses running from P(m(j - 1) + 1, j) to P(m(j - 1) + 1, t) ($1 \le j \le t$). Note that these
are precisely the first rows of our $m \times n$ subPARBS. For all j, P(m(j - 1) + 1, j) broadcasts
the Cartesian coordinates of $q_j$ eastbound on the horizontal subbus in row m(j - 1) + 1.
Every processor P(m(j - 1) + 1, k) with $j < k \le t$ computes the angle specified in Step 4.
Actually, computing the angle itself is not necessary: the tangent of the angle can be readily
computed using two subtractions and a division. Now the maximum of all values in the
first rows of these subPARBS can be computed in $O(\log n/\log m)$ time using Algorithm Maximum
developed in Section 2. It is easy to arrange for the maximum in row m(j - 1) + 1 to be
sent back to P(1,j). This clearly takes O(1) time since only the appropriate subbuses
have to be established and the information broadcast along them.

Step 5 can be implemented to run on an $n \times n$ subPARBS by using Algorithm Prefix-Maximum
discussed in Section 2.
Step 6 involves marking every P(1,j) that contains a point of the convex hull. After
this is done, a horizontal subbus is established in row 1. Every marked processor splits
this bus and broadcasts its identity westbound on its own subbus. This, in fact, creates
the list $CH_1$ as desired. Clearly, the running time of this step is O(1).

To summarize our discussion we state the following result. 

Theorem 4. The convex hull of a planar set of n points can be computed on a PARBS
of size $nm \times n$ with $3 \le m \le n$ in $O(\log n/\log m)$ time. □

In particular, if $m = n^{0.5}$ then we have the following result.

Corollary 4.1. The convex hull of a planar set of n points can be computed in O(1) time
on a PARBS of size $n^{1.5} \times n$. □
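
The constant-time bound follows by substituting $m = n^{0.5}$ into Theorem 4. The short calculation below is a sketch of that substitution, written out here for convenience rather than taken from the original text.

```latex
\[
  \frac{\log n}{\log m}
  = \frac{\log n}{\log n^{0.5}}
  = \frac{\log n}{0.5\,\log n}
  = 2 = O(1),
  \qquad
  nm \times n = \left(n \cdot n^{0.5}\right) \times n = n^{1.5} \times n .
\]
```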


4 Conclusion 

A bus system that can be dynamically altered to suit communicational needs among cooperating
processors is referred to as reconfigurable. In this paper we presented a fast adaptive
algorithm to solve the planar convex hull problem.

Specifically, we showed that computing the convex hull of a set of n points in the plane
takes $O(\log n/\log m)$ time on a 2-d PARBS of size $nm \times n$ with $3 \le m \le n$. In particular, our result
implies that the same problem can be solved in O(1) time on a 2-d PARBS of size $n^{1.5} \times n$.